@yuankaichen-amd (Contributor):

This CLI takes a *-pretrain.yaml file as input and projects memory usage on a single worker (rank 0 by default). It also prints a per-submodule breakdown of parameter counts and activation memory usage for the model.

Currently it only supports Megatron config files.

Example usage:

NNODES=96 PRIMUS_MODEL=deepseek_proxy_2T PRIMUS_EP=16 PRIMUS_PP=12 bash runner/primus-cli direct --no-gpu --single -- projection memory --config examples/megatron/configs/MI300X/deepseek_v2-pretrain.yaml

Example output:

Total Number of Parameters: 2049.574961 Billion (2,049,574,961,152.0)

[embedding]
Params: 0.822084 Billion (822,083,584)
Activation Memory: 0.0625 GB

[dense_transformer_layer]
Params: 0.453018 Billion (453,017,600.0)
Activation Memory: 0.6250 GB
...

[Primus:Projection] Memory Projection Summary on Rank 0:
Params: 13.006799 Billion (13,006,798,848.0)
Param+Optimizer Memory: 78.7379 GB
Activation Memory (per batch size 1, seq len 4096): 395.3125 GB
Projected Total Memory: 474.0504 GB
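
For context, the projected total in this summary appears to be the param+optimizer figure plus the per-rank activation figure. A minimal sketch of that bookkeeping (the variable names below are illustrative, not the tool's actual API):

# Illustrative only: how the summary numbers above combine on rank 0.
param_and_optimizer_gb = 78.7379      # parameters + optimizer states
activation_gb = 395.3125              # activations at micro batch size 1, seq len 4096
projected_total_gb = param_and_optimizer_gb + activation_gb
print(f"Projected Total Memory: {projected_total_gb:.4f} GB")  # -> 474.0504 GB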

Follow-up work:

(1) Consolidate parameters in the projection config;
(2) Add Torchtitan support.

Reviewed snippet (from the MLP activation-memory estimate):

# First GEMM output activations (bf16, 2 bytes per element)
total += num_tokens * self.config.model_config.ffn_hidden_size * 2
# Second GEMM output activations (bf16, 2 bytes per element)
total += num_tokens * self.config.model_config.ffn_hidden_size * 2
return total
Review comment (Contributor):

The non-SwiGLU case and tensor model parallelism are not taken into account here.
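
For illustration, one way to fold the gated/non-gated distinction and tensor model parallelism into this estimate is sketched below. The names tp_size, gated_linear_unit, and bytes_per_elem are assumptions for the sketch, not necessarily the actual Primus config fields, and the second-GEMM accounting mirrors the original snippet above:

def mlp_activation_bytes(num_tokens, ffn_hidden_size, tp_size=1,
                         gated_linear_unit=False, bytes_per_elem=2):
    # Per-rank FFN width under tensor model parallelism (column/row-parallel GEMMs).
    ffn_per_rank = ffn_hidden_size // tp_size
    # First GEMM: gated variants (e.g. SwiGLU) store both gate and up projections,
    # doubling the output compared with a plain MLP.
    first_gemm = num_tokens * ffn_per_rank * bytes_per_elem
    if gated_linear_unit:
        first_gemm *= 2
    # Second GEMM input after the (optional) gating collapses back to ffn_per_rank.
    second_gemm = num_tokens * ffn_per_rank * bytes_per_elem
    return first_gemm + second_gemm

# Example: 4096 tokens, ffn_hidden_size 12288, TP=8, SwiGLU enabled.
print(mlp_activation_bytes(4096, 12288, tp_size=8, gated_linear_unit=True) / 2**30, "GiB")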

@Xiaoming-AMD merged commit 0d567b8 into main on Nov 12, 2025 (3 checks passed).